Abstract

Natural language generation systems (NLG) map non-linguistic representations into strings of words through a number 
of steps using intermediate representations of various levels of abstraction. Template based systems, by contrast, tend to 
use only one representation level, i.e. fixed strings, which are combined, possibly in a sophisticated way, to generate the 
final text. 

In some circumstances, it may be profitable to combine NLG and template based techniques. The issue of combining 
generation techniques can be seen in more abstract terms as the issue of mixing levels of representation of different 
degrees of linguistic abstraction. This paper aims at defining a reference architecture for systems using mixed repre- 
sentations. We argue that mixed representations can be used without abandoning a linguistically grounded approach to 
language generation. 



1 Introduction

Natural language generation systems (NLG) map non-linguistic
representations into strings of words through a number of
steps using intermediate representations of various levels
of abstraction. Template based systems, by contrast, tend
to use only one representation level, i.e. fixed strings,
which are combined, possibly in a sophisticated way, to
generate the final text.

In some circumstances, it may be profitable to combine NLG
and template based techniques. The issue of combining
generation techniques can be seen in more abstract terms as
the issue of mixing levels of representation of different
degrees of linguistic abstraction. This paper aims at
defining a reference architecture for systems using mixed
representations. We argue that using templates does not
necessarily mean abandoning a linguistically grounded
approach to language generation.

The rest of this paper is organised as follows: in section
2 we briefly introduce the NLG and the template based
approaches to text generation. Section 3 offers a
theoretical framework within which the hybrid approach
combining these two strategies can be analysed. We also
review a number of existing systems from the point of view
of the proposed framework. Finally, section 4 presents a
declarative formalism that allows us to represent hybrid
objects characterised by different degrees of linguistic
abstraction.

2 NLG and templates

The strategy to perform automatic text generation called 
Natural Language Generation par excellence, or also Deep 
Generation, is characterised by the fact that it relies on 
conceptual models of language developed in current lin- 
guistic theories. Systems following this strategy are based 
on concepts such as morpheme, word, sentence, semantic 
or syntactic representation, communicative intention, etc. 
Each of these objects pertains to a level of linguistic anal- 
ysis that has specific rules and specific ways to represent 
information. 

Current architectures of NLG systems are made of an ordered
sequence of components consisting of Text Planner, Sentence
Planner and Linguistic Realiser (Reiter, 1994; Reiter and
Dale, 1997; Paiva, 1998; Cahill and Reape, 1998). Each
component expects an input and produces an output
pertaining to a certain level of linguistic analysis. More
specifically:

• the input of the Text Planner is some information 
represented in a non-linguistic formalism, for in- 
stance concepts of a Knowledge Base. The pieces 
of information feeding the generation process are 
usually called Messages. 

• the output of the Text Planner and input of the Sentence
Planner is a Text Plan, that is a tree structure where the
leaves are Messages and the nodes represent semantic
relations between subtrees which will eventually be
realized as text spans. In some cases semantic relations
correspond to rhetorical relations.

• the output of the Sentence Planner and input of the 
Linguistic Realiser is a list of Sentence Represen- 
tations. A Sentence Representation includes all the 
semantic and grammatical information necessary to 
generate exactly one correct sentence according to 
the syntactic, morphological and phonological rules 
of a certain language. 

• the output of the Linguistic Realiser is a list of strings
that constitute the generated text.

The NLG approach is usually compared with the template
based one (Reiter, 1995). The basic idea of a textual
template is that of a structure consisting of a sequence of
fixed strings and gaps filled at processing time with other
fixed strings. Let us call it a template of the static
type. A static template degenerates into canned text if no
gaps are present. Generation systems based solely on static
templates are called mail-merge systems. As suggested by
Reiter and Dale (1997), these systems can reach the
complexity and flexibility of a programming language, and
are thus functionally equivalent to NLG systems, at least
in principle.
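
As an illustration of the static type only, here is a minimal Python
sketch. The template wording, the gap names and the helper fill are
assumptions made for the example and are not taken from any of the
systems cited.

```python
from string import Template

# Hypothetical static template: fixed strings with two named gaps.
BOOKING = Template("Your room at the $hotel is reserved for $nights nights.")

# A template without gaps degenerates into canned text.
CANNED = Template("Thank you for your reservation.")


def fill(template: Template, **gaps: str) -> str:
    """Fill every gap with a fixed string (KeyError if a gap is left unfilled)."""
    return template.substitute(**gaps)


message = fill(BOOKING, hotel="Grand Hotel", nights="3")
canned = fill(CANNED)
```

Nothing in such a template knows that "nights" is a plural noun, which
is why agreement and multilingual variation are awkward to handle at
this level.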

Pros and cons of both approaches have been discussed in the
literature, see (Reiter and Mellish, 1993; Reiter, 1995;
Reiter and Dale, 1997; Busemann and Horacek, 1998). Let us
just mention some of them.

• Pros of NLG: declarativeness, theoretical soundness,
modularity, good portability across different domains
(i.e. reusability), aptness to handle multilingualism.

• Cons of NLG: low time efficiency, architectural com- 
plexity, linguistic resources are costly to develop 
and require specialized knowledge, patches of poorly 
understood linguistic phenomena, difficulties in in- 
tegrating linguistic content and lay-out information. 

• Pros of static templates: high time efficiency, ar- 
chitectural simplicity, efficient application develop- 
ment, only generic programming skills are required. 

• Cons of static templates: no theoretical grounding,
procedurality, low portability, multilingualism is
awkwardly handled.

In the last few years, templates have also been used within
NLG architectures, in order to overcome some of the
drawbacks of NLG, first of all time inefficiency and
resource development cost. Reiter and Mellish (1993)
introduce pointers to KB individuals among fixed strings
and insert canned text as the value of a frame representing
the meaning of a sentence. Busemann (1996) mixes templates
and syntactic representations. Cancedda et al. (1997)
insert templates as leaf nodes of a textual plan produced
by classical NLG techniques. These attempts to integrate
templates within NLG architectures all bring in mixed
representations (MR), absent from both pure NLG and static
templates.

Our approach to text generation aims at mixing
representation levels in a systematic and principled way.
From a practical perspective, this amounts to using
precompiled generation knowledge whenever possible, while
retaining the possibility of using a full-fledged NLG
approach when strictly necessary.

Let us see how the notion of mixed representation 
compares with the received view about the separation of 
representation levels. Standard NLG architectures map an 
input message to an output text passing through a number 
of intermediate representations, as we have seen. Each 
representation level is the input and/or output of a sepa- 
rate component coping with specific linguistic phenom- 
ena, e.g. communicative intentions, text structure, refer- 
ring expressions, morphology, etc. In the accepted view, 
representation levels should be kept carefully separated, 
on the grounds that separation enhances modularity and 
reflects different levels of linguistic analysis. In the Mixed 
Representation approach both these motivations are chal- 
lenged. 

On the one hand, we argue that representation levels can be
mixed while preserving the modularity of the linguistic
components. On the other hand, we argue that, while the
strict separation of representation levels is crucial when
taking a competence point of view on language, mixing
representations is acceptable in a more performance
oriented perspective. In practical terms, we consider it
plausible that human speakers produce discourse by mixing
dynamic planning with precompiled knowledge about the
structure and the relevance of texts, and produce sentences
by mixing flexible sentence planning and realization with
all kinds of (semi-)idiomatic expressions, (semi-)fixed
descriptions of individuals, precompiled sentence patterns,
phrases stored in the short-term memory, etc.

Let us conclude this section by making a terminological
point. Instead of the opposition between NLG and templates
introduced by Reiter (1995), more recently Busemann and
Horacek (1998) have proposed a distinction between in-depth
and shallow generation which parallels the distinction
between deep and shallow analysis. We think that this
proposal goes in the right direction for two reasons.
First, identifying natural language generation with deep
generation seems misleading and a little arbitrary. Second,
as we will see in the rest of the paper, templates are just
one among various shallow generation techniques available.

3 The hybrid approach to automatic 
text generation 

In this section we try to characterize the so-called hybrid
approach to text generation in terms of mixing
representation levels. Before doing that, we need to single
out a few more levels of representation.

We start by distinguishing three components which 
are virtually present in any Linguistic Realiser, that is 
the Sentence Grammar, the Morphological Synthesizer 
and the Phonological Adjustment Component. The lit- 
tle attention paid so far to Morphological and Phonolog- 
ical components may be explained by the fact that many 
of the generation systems described in the literature pro- 
duce texts in English, a language with relatively little in- 
flectional Morphology and a restricted number of Phono- 
logical Adjustment phenomena. In principle, a Sentence 
Grammar can incorporate the other two components, since 
it can produce directly complete words and apply phono- 
logical adjustment rules as soon as a pair of adjacent words 
is available. However, we would rather keep the three 
components apart, in order to broaden the possible range 
of mixed representations, as discussed below. The
distinctions introduced in the Linguistic Realiser imply
new representation levels:

• the output of the Sentence Grammar and input of 
the Morphological Synthesizer is a list of morpho- 
logical bundles, i.e. sets of morphological features 
that are mapped onto potential words. 

• the output of the Morphological Synthesizer and 
the input of the Phonological Component is a list 
of potential words, which are word forms that can 
undergo phonological adjustments. 

• the output of the Phonological Component is a list 
of strings. 

We must spend a few words also on the task of for- 
matting, which is often underestimated in the literature. 
More and more often generation systems are expected to 
produce formatted text rather than bare ASCII code. For- 
matting information is taken in the broad sense of tags 
for typographical formatting, pointers to images, hyper- 
textual links, annotations for texts feeding a speech syn- 
thesizer, etc. In order to be effective, formatting deci- 
sions cannot be taken by a component independent from 
the NLG architecture and subsequent to it. They need 
to be taken at early stages of the NLG process, see (Re- 
iter et al., 1995). For instance, if you want to emphasise 
typographically a certain portion of a sentence content, 
you need to take this decision within the Sentence Plan- 
ning component, when the relevant semantic information 
is available. Or, if you want your text to be articulated 
into sections and paragraphs, you need to take decisions 
about these aspects during the Text Planning stage. The 
actual execution of formatting instructions can be left to a 
component that operates after the Linguistic Realiser, but 
the formatting decisions themselves need to be taken at 
the appropriate level of linguistic abstraction. 

The finer grained version of the NLG architecture pro- 
posed here is made of six components: Text Planner, Sen- 
tence Planner, Sentence Grammar, Morphological Syn- 
thesizer, Phonological Component, Formatting Realiser. 



The corresponding representation levels are: (1,Msg) [1]
Message, (2,TPlan) Text Plan, (3,SRep) Sentence
Representation, (4,MBundle) Morphological Bundle, (5,PWord)
Potential Word, (6,Str) String and (7,Frm) Formatting
Instructions [2].

In a hybrid architecture all levels can be mixed. Mix- 
ing representation levels can be done either by concate- 
nation or by embedding. Concatenation means building 
a list of objects pertaining to different levels. For
instance, one can have a list with the structure [<string>,
<potential word>, <sentence representation>, <string>].
Mixing by embedding means that an object of a certain 
level is nested inside a structure of a different level. Only 
structured objects can be the locus of an embedding. Fixed 
strings and potential words can be embedded, but one can- 
not embed into them. 
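
The two ways of mixing can be pictured with a small Python data model.
This is only a sketch: the class names Str, PWord and SRep mirror the
level abbreviations used here, but the slot dictionary and the helper
embedded are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Str:
    """Level 6: a fixed string. Atomic, so nothing can be embedded into it."""
    text: str


@dataclass(frozen=True)
class PWord:
    """Level 5: a potential word. Atomic as well."""
    category: str
    base: str


@dataclass
class SRep:
    """Level 3: a structured object whose slots may hold embedded objects."""
    slots: dict = field(default_factory=dict)


def embedded(obj):
    """Yield every object nested inside obj, at any depth."""
    if isinstance(obj, SRep):
        for value in obj.slots.values():
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, (Str, PWord, SRep)):
                    yield item
                    yield from embedded(item)


# Mixing by concatenation: a flat list of objects of different levels.
concatenation = [Str("I will"), PWord("verb", "arrive"),
                 SRep({"pred": "airport"}), Str(".")]

# Mixing by embedding: atomic objects nested inside a structured one.
embedding = SRep({"subject": [PWord("pronoun", "I")],
                  "verb": Str("arrive at")})
```

Since Str and PWord have no slots, they can appear inside an SRep but
can never be the locus of an embedding themselves.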

Let us discuss a few cases from the literature in the light
of the theoretical framework that we are proposing for
hybrid systems. The generation techniques proposed in IDAS
(Reiter and Mellish, 1993; Reiter et al., 1995) mix
representation levels in two ways. Knowledge base
references to entities can be embedded into portions of
canned text, which gives a solution of the type [Msg, Str].
They also fill case frame slots with canned text, which
corresponds to a type [SRep(Str)].

Busemann (1996) presents TG/2, a surface generator taking
as input formulae of the GIL sentence representation
formalism. This system is based on production rules
employing canned text, templates and syntactic
representations. The production rules can contain calls to
other rules, lines of Lisp code and canned text. The whole
proposal seems to be a solution of the type [Msg, SRep,
Str], where messages are picked up through direct Lisp
calls. Then, Busemann and Horacek (1998) do away with the
GIL representation interface and replace it with an
Intermediate Representation layer that is made up of domain
specific conceptual structures. More on this approach in
section 4.2.

Geldof and Van de Velde (1997) propose a template 
based system for generating hypertexts. They use tem- 
plates made up of canned text interleaved with "abstract 
terms referring to domain concepts". This type of tem- 
plate corresponds, in their words, to the IDAS' solution 
of type [Msg, Str] mentioned above. Then, there are tem- 
plates with hypertext links. Finally, a text schema is used 
to structure the text. This gives a solution of the type
[TPlan([Msg, Str, Frm])].



[1] The pairs (<level number>,<abbreviation>) are given to
ease future reference.

[2] Formatting instructions can be introduced at any level,
so we mention level (7,Frm) only as an abstraction.



4 The Hyper Template Planning
Language

In order to extend the potential of the hybrid approach,
Cancedda et al. (1997) developed a specific representation
language, Hyper Template Planning Language (HTPL), which
allows one to mix together MBundle, PWord, Str and Frm. We
call flexible templates the kind of structures that can be
built by mixing these representation levels. Recall that
static templates were defined above as operating on fixed
strings.

Then, in (Pianta and Tovena, 1998) the expressive power of
HTPL was extended by adding the possibility to mix also
Messages and Sentence Representations. Here is a list of
the linguistic representation levels available in the
current version of HTPL.

message representation (1,Msg): a formula in some content
representation formalism. When specifying a message
representation, one should also specify the formalism and
the type of message object which is being described. For
instance, msg('IF', attribute, location=pittsburgh) can be
used to refer to an attribute-value pair of the Interchange
Format (IF) representation language (Tovena and Pianta,
1999). Message representations are handled by a specialized
component during the interpretation process. In the current
version of HTPL, messages can be at most proposition level
content specifications. Thus, the component that handles
them is in fact a Sentence Planner.

phrase representation (3,SRep): an abstract represen- 
tation of a phrase which can feed a specific tactical gen- 
erator. For instance, we can specify phrases in terms of 
grammatical functions such as subject, verb, object, ad- 
juncts, determiner etc., in the spirit of LFG. 

phrase(lfg,
    [subject=
        [spec=the, num=sing, pred=room]])

morphological bundle (4,MBundle): a set of morphological
features corresponding to a word form. For example, the
bundle morpho([cat=noun, pred=room, num=plur]) can be seen
as an abstract representation of the word form rooms. When
used in HTPL expressions, the values of morphological
features can be variables: morpho([cat=noun, pred=room,
num=NUM]). Morphological variables make it easier to treat
agreement phenomena, which are awkward to handle with
static templates.
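
To see why a shared variable helps with agreement, consider the
following Python sketch. It is not HTPL: the dictionary encoding of
bundles, the toy lexicon and the functions instantiate and realise are
all assumptions made for the example.

```python
# Toy lexicon (an assumption): (base form, number) -> word form.
LEXICON = {
    ("this", "sing"): "this", ("this", "plur"): "these",
    ("room", "sing"): "room", ("room", "plur"): "rooms",
}


def instantiate(bundle, bindings):
    """Replace variable values (upper-case names, as in NUM) by their bindings."""
    return {feat: bindings.get(val, val) if val.isupper() else val
            for feat, val in bundle.items()}


def realise(bundles, bindings):
    """Instantiate each bundle and look up its word form in the toy lexicon."""
    words = []
    for bundle in bundles:
        ground = instantiate(bundle, bindings)
        words.append(LEXICON[(ground["pred"], ground["num"])])
    return " ".join(words)


# Both bundles share the variable NUM, so determiner and noun always agree.
TEMPLATE = [
    {"cat": "det", "pred": "this", "num": "NUM"},
    {"cat": "noun", "pred": "room", "num": "NUM"},
]

plural = realise(TEMPLATE, {"NUM": "plur"})
singular = realise(TEMPLATE, {"NUM": "sing"})
```

A static template would need one fixed string per combination, whereas
here a single binding for NUM determines both word forms.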

potential word (5,PWord): a word form which can undergo
phonological adjustment. We describe a potential word by
specifying the lexical category of the word and its base
form: w(noun, albergo). Sequences of potential words are
mapped onto strings by phonological and orthographic rules:
for example in Italian [w(prep, di), w(article, il),
w(noun, albergo)] becomes ["dell'albergo"]. The preposition
di is first combined with the article il yielding the
compound form del (of the). The latter combines with a noun
beginning with a vowel yielding a contracted word group
which is orthographically represented as "dell'albergo" (of
the hotel).
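
The two adjustment steps just described can be sketched in Python as
follows. The rule tables cover only a tiny fragment of Italian and,
like the function names, are assumptions made for illustration; they
do not reproduce the rules of the actual Phonological Component.

```python
VOWELS = "aeiou"

# Toy rule tables (assumptions covering only a fragment of Italian).
PREP_ARTICLE = {("di", "il"): "del", ("a", "il"): "al"}
ELIDED = {"del": "dell'", "al": "all'"}


def starts_with_vowel(word):
    return bool(word) and word[0].lower() in VOWELS


def adjust(pwords):
    """Map a list of (category, base form) potential words onto one string."""
    # Step 1: fuse an adjacent preposition and article into a compound form.
    fused = []
    i = 0
    while i < len(pwords):
        cat, form = pwords[i]
        nxt = pwords[i + 1] if i + 1 < len(pwords) else None
        if (cat == "prep" and nxt is not None and nxt[0] == "article"
                and (form, nxt[1]) in PREP_ARTICLE):
            fused.append(PREP_ARTICLE[(form, nxt[1])])
            i += 2
        else:
            fused.append(form)
            i += 1
    # Step 2: contract a compound form before a vowel-initial word and
    # attach it to that word without an intervening space.
    out = ""
    for idx, form in enumerate(fused):
        following = fused[idx + 1] if idx + 1 < len(fused) else ""
        if form in ELIDED and starts_with_vowel(following):
            out += ELIDED[form]
        else:
            out += form + " "
    return out.strip()


hotel = adjust([("prep", "di"), ("article", "il"), ("noun", "albergo")])
train = adjust([("prep", "di"), ("article", "il"), ("noun", "treno")])
```

Here hotel is "dell'albergo", while train stays "del treno" because the
noun does not begin with a vowel.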

string (6,Str): a sequence of characters inserted in the 
text without modification, for instance: "hotel reserva- 
tion". 

The representation levels have been listed here following
an ordering which is relevant for the HTPL interpreter, see
below. However the ordering is also meaningful from a
linguistic point of view. A phrase representation is
linguistically less abstract than a message representation.
A potential word pertains to a less complex constituency
level than a phrase representation. A potential word is
less abstract than a morphological bundle, etc.

Objects of level 1 through 4 are all parametric, i.e. 
they can contain variables which are instantiated at pro- 
cessing time. This allows the possibility of sharing infor- 
mation between objects of different levels. 

An HTPL expression can include any combination of 
the above representation levels. Both concatenation and 
embedding are possible. Here is an example of concate- 
nation: 

[w(pronoun, 'I'), w(modal, will),
 "arrive at", w(article, the),
 morpho([cat=noun,
         pred=airport,
         num=sing]),
 msg('IF', attribute, time=sunday)]



and here is an embedding: 

phrase(lfg,
    [subject=
        htpl([w(pronoun, 'I')]),
     modality=will,
     verb=
        htpl([
            "arrive at",
            w(article, the),
            morpho([cat=noun,
                    pred=airport,
                    num=sing])]),
     adjuncts=
        htpl([
            msg('IF', attribute,
                time=sunday)])
    ])



Both these HTPL expressions correspond to the sentence I'll
arrive at the airport on Sunday. Of course the first
expression can be realized more efficiently than the
second, as it does not need to be handled by the Sentence
Generator. However the second allows more flexibility;
under certain conditions, the Sentence Generator could
topicalize the adjunct yielding ON SUNDAY, will I arrive at
the airport. Also, note that what can be embedded is not
simply canned text but any legal HTPL expression. Embedding
is explicitly marked by enclosing the embedded expression
in the scope of the htpl operator.

formatting (7,Frm): Typographical formatting phenomena are
handled in HTPL by including basic expressions in the scope
of one or more formatting operators such as: italic, bold
etc. Hypertextual links are treated as a special class of
format instructions. They are specified by descriptors
which refer to linked documents through absolute addresses
(file name) or functional expressions, evaluated at run
time. Pictures are inserted in text through the same
mechanism. Here follow other HTPL objects.

slot specifications: slot(<parameters>). The run-time
evaluation of a slot specification is expected to yield an
HTPL expression.

template definitions: template(<template descriptor>,
<HTPL expression>). The <template descriptor> can include
variables, thus allowing the definition of parametric
templates.

control expressions: if_then, if_then_else, or. In
conditional expressions, an HTPL expression is realized in
the generated text only if some constraint is satisfied.
Here is an example of a template definition including a
conditional expression and a recursive call to other
templates:

template(controls(ActID),
    if_then_else(
        exist_many_controls(ActID),
        template(item_controls(ActID)),
        template(coord_controls(ActID))))

Disjunctive expressions give alternative ways of phrasing
something, for example: or(["taking into account",
"considering"]). When the HTPL interpreter finds a
disjunctive expression for the first time, it chooses one
of the alternatives randomly; the second time, one of the
remaining alternatives is selected, and so on. When all
alternatives have been used at least once, the whole set
becomes available again.
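
The selection policy for disjunctions amounts to sampling without
replacement and refilling the pool once it is empty. A minimal Python
sketch, in which the class name Disjunction and its interface are
assumptions rather than part of HTPL:

```python
import random


class Disjunction:
    """Pick alternatives at random without repetition until all are used."""

    def __init__(self, alternatives, rng=None):
        self._alternatives = list(alternatives)
        if not self._alternatives:
            raise ValueError("a disjunction needs at least one alternative")
        self._remaining = []
        self._rng = rng or random.Random()

    def choose(self):
        # When every alternative has been used, the whole set is available again.
        if not self._remaining:
            self._remaining = list(self._alternatives)
        pick = self._rng.choice(self._remaining)
        self._remaining.remove(pick)
        return pick


phrasing = Disjunction(["taking into account", "considering"],
                       rng=random.Random(0))
first_round = {phrasing.choose(), phrasing.choose()}
second_round = {phrasing.choose(), phrasing.choose()}
```

Whatever the random seed, each round of two calls yields both
phrasings exactly once, which keeps the generated text from repeating
the same wording in close succession.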

4.1 The HTPL interpreter 

The HTPL interpreter must be able to handle both
concatenated and embedded Mixed Representations (MR). As
for concatenated MRs, the interpreter scans the list of
objects several times. The first time, it calls the
component appropriate for all objects of level (1, Msg),
that is a Sentence Planner. Then, it passes all the objects
of level (3, SRep) to the Sentence Grammar, and so on up to
objects of level (6, Str), which are passed to the
Formatting Realiser. The latter translates formatting
instructions into HTML tags. Note that each component
related to a certain level can produce MRs as output,
although of a less abstract level.
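
The repeated scanning of a concatenated MR can be sketched in Python as
one pass per level, most abstract first. This is an illustration under
stated assumptions, not the actual interpreter: objects are encoded as
(level, payload) pairs, the handlers are toy stand-ins for the real
components, potential words are handled one at a time rather than as
sequences, and the formatting stage is left out.

```python
# Level numbers as given in section 3.
MSG, SREP, MBUNDLE, PWORD, STR = 1, 3, 4, 5, 6


def interpret(objects, handlers):
    """Resolve a concatenated MR with one scan per level.

    objects is a list of (level, payload) pairs; handlers maps a level to a
    function returning a list of pairs of strictly less abstract levels.
    """
    for level in sorted(handlers):
        resolved = []
        for obj_level, payload in objects:
            if obj_level == level:
                output = handlers[level](payload)
                if any(out_level <= level for out_level, _ in output):
                    raise ValueError("a handler must lower the abstraction level")
                resolved.extend(output)
            else:
                resolved.append((obj_level, payload))
        objects = resolved
    return objects


# Hypothetical handlers standing in for the real components.
handlers = {
    MSG: lambda m: [(STR, "on"), (PWORD, ("noun", m["time"].capitalize()))],
    MBUNDLE: lambda b: [(PWORD, (b["cat"],
                                 b["pred"] + ("s" if b["num"] == "plur" else "")))],
    PWORD: lambda w: [(STR, w[1])],
}

mixed = [
    (PWORD, ("pronoun", "I")), (PWORD, ("modal", "will")),
    (STR, "arrive at"), (PWORD, ("article", "the")),
    (MBUNDLE, {"cat": "noun", "pred": "airport", "num": "sing"}),
    (MSG, {"time": "sunday"}),
]
text = " ".join(payload for _, payload in interpret(mixed, handlers))
```

In this toy run the message handler returns a mixed list itself (a
string followed by a potential word), which later passes resolve, so
text ends up as "I will arrive at the airport on Sunday".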

Handling embedded MRs is more complex, as it depends highly
on the working of the single components. For this reason,
solving embedded MRs is the responsibility of each
component, and will not be further discussed here. The only
generic constraint enforced by the HTPL interpreter is that
an object of level n should not embed objects of higher
abstraction levels.

4.2 Mixed Representations vs Intermediate 
Representations 

In section 3 we already analysed various hybrid approaches
to text generation in terms of the MRs framework. In this
section we make some additional comparisons between our
proposal and that in (Busemann and Horacek, 1998). The two
approaches share many practical motivations and adopt a
number of similar or equivalent technical solutions. There
is one point however that sets the two approaches well
apart. Busemann and Horacek (1998) introduce Intermediate
Representations, which can be characterized as language
independent but domain dependent representations. Notice
that the domain dependency holds not only at the level of
the concepts of the ontology but also at the level of the
syntax and the interpretation of the representation
language. In other words, Intermediate Representations are
very different both from language dependent grammar
representation formalisms such as SPL, and from knowledge
representation formalisms based on a general syntax and a
general semantic interpretation mechanism. As the authors
themselves suggest, the use of this kind of representation
seriously undermines the standard NLG architecture as it
does not acknowledge the text analysis levels on which that
architecture is based. We think that the notion of Mixed
Representation does not have the same repercussions on the
NLG architecture. In our proposal, all analysis levels are
kept, indeed some more are made explicit. What is different
with respect to the NLG architecture is the possibility to
introduce precompiled knowledge at any stage of the
generation process. This, from a processing point of view,
corresponds to the ability to skip unnecessary intermediate
processing stages.

5 Conclusion 

In this paper we discussed the hybrid approach to automatic
text generation. The concept of mixed linguistic
representation turned out to be a core notion for building
a theoretical framework within which to represent different
attempts to combine NLG and template based approaches. This
conceptual framework led us to propose a more detailed
version of the standard NLG architecture and hence new types
of mixed representations. These ideas were implemented in
HTPL, which has been successfully used in two applied
projects, see (Cancedda et al., 1997; Pianta and Tovena,
1998).